@codeflash-ai codeflash-ai bot commented Oct 22, 2025

📄 18% (0.18x) speedup for AmazonBedrockEmbeddingFunction.build_from_config in chromadb/utils/embedding_functions/amazon_bedrock_embedding_function.py

⏱️ Runtime: 542 microseconds → 460 microseconds (best of 60 runs)

📝 Explanation and details

The optimized code achieves a roughly 18% speedup (542 microseconds down to 460 microseconds) through several micro-optimizations that reduce Python bytecode operations:

Key optimizations:

  1. Faster attribute access: Replaced hasattr(session, attr) and session.attr with getattr(session, attr, None) for session args extraction. This eliminates duplicate attribute lookups and reduces opcodes.

  2. Optimized type checking: Added a type(value) not in _primitive_types check before the more expensive isinstance() call. For primitive types (the common case), this avoids the overhead of isinstance() entirely.

  3. Simplified session creation: Collapsed the if-else branch for session creation into a single conditional expression boto3.Session(**session_args) if session_args is not None else boto3.Session(), eliminating redundant branching.

  4. Import reorganization: Moved imports to follow PEP 8 style (stdlib before third-party), though this has minimal performance impact.
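
The first three items can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `_primitive_types` is the only name taken from the description above, while `is_primitive`, `read_attr`, `make_session`, and `FakeSession` (a stand-in for `boto3.Session`) are hypothetical helpers introduced here.

```python
from types import SimpleNamespace

# Tuple of accepted types, mirroring the isinstance() check in the tests.
_primitive_types = (str, int, float, bool, list, dict, tuple)


def is_primitive(value):
    """Exact-type membership first, isinstance() only as a fallback."""
    if type(value) in _primitive_types:
        return True
    # Subclasses (e.g. a str subclass) miss the fast path and land here.
    return isinstance(value, _primitive_types)


def read_attr(obj, name):
    """One getattr() with a default instead of hasattr() plus attribute access."""
    return getattr(obj, name, None)


class FakeSession:
    """Hypothetical stand-in for boto3.Session so the sketch runs without boto3."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


def make_session(session_args=None):
    """Single conditional expression in place of an if/else block."""
    return FakeSession(**session_args) if session_args is not None else FakeSession()


session = SimpleNamespace(region_name="us-west-2")
print(read_attr(session, "region_name"))
print(read_attr(session, "profile_name"))
print(is_primitive(42), is_primitive(object()))
```

Note that the exact-type check alone would reject subclasses of the allowed types, which is why the `isinstance()` fallback is kept in this sketch.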

Performance characteristics from tests:

  • Large-scale kwargs (1000 items): 46% faster - the type checking optimization has significant impact with many kwargs
  • Mixed primitive types: 9.87% faster - benefits from the optimized type checking
  • Basic cases: 3-4% improvements from reduced attribute lookups
  • Some cases show minor regressions (about 1.5-8% in the generated tests: session_args, empty kwargs, extra config keys), attributed to the additional tuple lookup, but these are outweighed by gains in common scenarios

The optimizations are most effective when dealing with many kwargs or repeated calls, as the reduced per-item overhead compounds.
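
One way to check the per-item claim locally is a small `timeit` comparison over 1000 kwargs, the same size as the large-scale test. This is a sketch under assumptions: the two checker functions are hypothetical reconstructions of the "before" and "after" validation loops, not the PR's code, and the numbers will vary by machine and Python version.

```python
import timeit

_primitive_types = (str, int, float, bool, list, dict, tuple)

# Same shape as the large-scale test: 1000 integer-valued kwargs.
kwargs = {f"key_{i}": i for i in range(1000)}


def check_isinstance_only(items):
    """Hypothetical 'before' loop: isinstance() on every value."""
    for key, value in items.items():
        if not isinstance(value, _primitive_types):
            raise ValueError(f"Keyword argument {key} is not a primitive type")


def check_type_first(items):
    """Hypothetical 'after' loop: exact-type membership before isinstance()."""
    for key, value in items.items():
        if type(value) not in _primitive_types and not isinstance(value, _primitive_types):
            raise ValueError(f"Keyword argument {key} is not a primitive type")


# Take the best of several repeats to reduce scheduling noise.
baseline = min(timeit.repeat(lambda: check_isinstance_only(kwargs), number=200, repeat=5))
candidate = min(timeit.repeat(lambda: check_type_first(kwargs), number=200, repeat=5))
print(f"isinstance only: {baseline:.4f}s  type-first: {candidate:.4f}s")
```

Both loops accept and reject the same inputs, so any difference in the printed timings comes only from the order of the checks.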

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 21 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 1 Passed |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import sys

# imports
import pytest  # used for our unit tests
from chromadb.utils.embedding_functions.amazon_bedrock_embedding_function import \
    AmazonBedrockEmbeddingFunction

# ---- UNIT TESTS ----

# Helper stub for boto3.Session
class DummySession:
    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.region_name = kwargs.get("region_name", None)
        self.profile_name = kwargs.get("profile_name", None)

# ---- Basic Test Cases ----









def test_edge_unexpected_kwargs_type():
    # Should allow only primitive types in kwargs (checked in __init__)
    class WeirdType: pass
    config = {
        "model_name": "test-model",
        "kwargs": {"bad": WeirdType()}
    }
    # Should raise ValueError from __init__ (simulate real behavior)
    # We'll monkeypatch __init__ to check for this
    orig_init = AmazonBedrockEmbeddingFunction.__init__
    def _init(self, session, model_name, **kwargs):
        for k, v in kwargs.items():
            if not isinstance(v, (str, int, float, bool, list, dict, tuple)):
                raise ValueError(f"Keyword argument {k} is not a primitive type")
        orig_init(self, session, model_name, **kwargs)
    AmazonBedrockEmbeddingFunction.__init__ = _init
    try:
        with pytest.raises(ValueError):
            AmazonBedrockEmbeddingFunction.build_from_config(config)
    finally:
        AmazonBedrockEmbeddingFunction.__init__ = orig_init





def test_edge_importerror(monkeypatch):
    # Simulate boto3 not installed
    monkeypatch.setitem(sys.modules, "boto3", None)
    config = {"model_name": "test-model"}
    with pytest.raises(ValueError) as excinfo:
        AmazonBedrockEmbeddingFunction.build_from_config(config) # 16.6μs -> 15.8μs (4.84% faster)

# ---- Large Scale Test Cases ----





#------------------------------------------------
import types

# imports
import pytest  # used for our unit tests
from chromadb.utils.embedding_functions.amazon_bedrock_embedding_function import \
    AmazonBedrockEmbeddingFunction


# Minimal stub for EmbeddingFunction and Documents/Embeddings
class EmbeddingFunction:
    pass

# --- Pytest fixtures and stubs ---

class DummySession:
    """A dummy session to simulate boto3.Session for testing."""
    def __init__(self, region_name=None, profile_name=None):
        self.region_name = region_name
        self.profile_name = profile_name

    def client(self, service_name, **kwargs):
        # Dummy client, not used in build_from_config
        return None

@pytest.fixture
def dummy_boto3(monkeypatch):
    """Monkeypatch import of boto3 to use DummySession."""
    dummy_module = types.SimpleNamespace()
    dummy_module.Session = DummySession
    monkeypatch.setitem(__import__('sys').modules, 'boto3', dummy_module)
    # monkeypatch restores sys.modules on teardown, so no manual cleanup is needed
    yield

# --- Unit tests ---

# 1. Basic Test Cases


def test_basic_with_session_args(dummy_boto3):
    """Test with session_args provided."""
    config = {
        "model_name": "amazon.titan-embed-text-v1",
        "session_args": {"region_name": "us-west-2", "profile_name": "test_profile"}
    }
    codeflash_output = AmazonBedrockEmbeddingFunction.build_from_config(config); ef = codeflash_output # 5.03μs -> 5.39μs (6.75% slower)

def test_basic_with_kwargs(dummy_boto3):
    """Test with additional primitive kwargs."""
    config = {
        "model_name": "amazon.titan-embed-text-v1",
        "kwargs": {"foo": "bar", "num": 42, "flag": True}
    }
    codeflash_output = AmazonBedrockEmbeddingFunction.build_from_config(config); ef = codeflash_output # 6.62μs -> 6.38μs (3.73% faster)

# 2. Edge Test Cases

def test_missing_model_name(dummy_boto3):
    """Test with missing model_name (should raise AssertionError)."""
    config = {}
    with pytest.raises(AssertionError) as excinfo:
        AmazonBedrockEmbeddingFunction.build_from_config(config) # 2.02μs -> 1.94μs (3.76% faster)

def test_empty_session_args(dummy_boto3):
    """Test with empty session_args dict."""
    config = {
        "model_name": "amazon.titan-embed-text-v1",
        "session_args": {}
    }
    codeflash_output = AmazonBedrockEmbeddingFunction.build_from_config(config); ef = codeflash_output # 4.77μs -> 5.19μs (8.00% slower)

def test_kwargs_with_various_types(dummy_boto3):
    """Test with kwargs containing all allowed primitive types."""
    config = {
        "model_name": "amazon.titan-embed-text-v1",
        "kwargs": {
            "str_val": "abc",
            "int_val": 1,
            "float_val": 2.3,
            "bool_val": False,
            "list_val": [1, 2],
            "dict_val": {"x": 1},
            "tuple_val": (1, 2)
        }
    }
    codeflash_output = AmazonBedrockEmbeddingFunction.build_from_config(config); ef = codeflash_output # 9.12μs -> 8.30μs (9.87% faster)

def test_kwargs_with_non_primitive(dummy_boto3):
    """Test with kwargs containing a non-primitive type (should raise ValueError in __init__)."""
    class NonPrimitive: pass
    config = {
        "model_name": "amazon.titan-embed-text-v1",
        "kwargs": {"bad": NonPrimitive()}
    }
    # Patch __init__ to raise as in original code
    orig_init = AmazonBedrockEmbeddingFunction.__init__
    def patched_init(self, session, model_name="amazon.titan-embed-text-v1", **kwargs):
        for key, value in kwargs.items():
            if not isinstance(value, (str, int, float, bool, list, dict, tuple)):
                raise ValueError(f"Keyword argument {key} is not a primitive type")
        orig_init(self, session, model_name, **kwargs)
    AmazonBedrockEmbeddingFunction.__init__ = patched_init
    try:
        with pytest.raises(ValueError):
            AmazonBedrockEmbeddingFunction.build_from_config(config) # 5.58μs -> 5.42μs (3.03% faster)
    finally:
        # Restore original __init__ even if the assertion above fails
        AmazonBedrockEmbeddingFunction.__init__ = orig_init

def test_import_error(monkeypatch):
    """Test when boto3 is not installed (simulate ImportError)."""
    import sys
    # A None entry in sys.modules makes `import boto3` raise ImportError;
    # monkeypatch undoes the change automatically after the test.
    monkeypatch.setitem(sys.modules, 'boto3', None)
    config = {"model_name": "amazon.titan-embed-text-v1"}
    with pytest.raises(ValueError):
        AmazonBedrockEmbeddingFunction.build_from_config(config)

def test_kwargs_empty_dict(dummy_boto3):
    """Test with explicitly empty kwargs dict."""
    config = {
        "model_name": "amazon.titan-embed-text-v1",
        "kwargs": {}
    }
    codeflash_output = AmazonBedrockEmbeddingFunction.build_from_config(config); ef = codeflash_output # 5.55μs -> 5.87μs (5.49% slower)

def test_extra_keys_in_config(dummy_boto3):
    """Test with extra unrelated keys in config (should be ignored)."""
    config = {
        "model_name": "amazon.titan-embed-text-v1",
        "session_args": {"region_name": "us-east-1"},
        "kwargs": {"foo": "bar"},
        "extra1": "ignoreme",
        "extra2": 123
    }
    codeflash_output = AmazonBedrockEmbeddingFunction.build_from_config(config); ef = codeflash_output # 5.99μs -> 6.08μs (1.51% slower)

# 3. Large Scale Test Cases

def test_large_scale_kwargs(dummy_boto3):
    """Test with a large number of primitive kwargs."""
    large_kwargs = {f"key_{i}": i for i in range(1000)}
    config = {
        "model_name": "amazon.titan-embed-text-v1",
        "kwargs": large_kwargs
    }
    codeflash_output = AmazonBedrockEmbeddingFunction.build_from_config(config); ef = codeflash_output # 250μs -> 171μs (46.0% faster)



#------------------------------------------------
from chromadb.utils.embedding_functions.amazon_bedrock_embedding_function import AmazonBedrockEmbeddingFunction
import pytest

def test_AmazonBedrockEmbeddingFunction_build_from_config():
    with pytest.raises(ValueError, match='The\\ boto3\\ python\\ package\\ is\\ not\\ installed\\.\\ Please\\ install\\ it\\ with\\ `pip\\ install\\ boto3`'):
        AmazonBedrockEmbeddingFunction.build_from_config({})

🔎 Concolic Coverage Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| `codeflash_concolic_aqrniplu/tmprus2lwk1/test_concolic_coverage.py::test_AmazonBedrockEmbeddingFunction_build_from_config` | 107μs | 107μs | -0.354% ⚠️ |

To edit these changes, run `git checkout codeflash/optimize-AmazonBedrockEmbeddingFunction.build_from_config-mh1l8qrq`, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 22, 2025 06:04
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 22, 2025